[executorch] Propagate device metadata from partitioner result onto TensorSpecs #18078
Gasoonjia merged 15 commits into gh/gasoonjia/135/base
Conversation
…ensorSpecs Add end-to-end device type annotation support from export to runtime. Currently we only support one device per graph. The overall pipeline is: a. The partitioner uses `compile_spec` to determine which device the partitioned blob is running on. b. After the partitioned graph is lowered to a backend, the newly introduced propagate_device_pass annotates the input and output tensors of the delegate blob with the target device. Differential Revision: [D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/) [ghstack-poisoned]
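To make the two-step pipeline concrete, here is a minimal, hypothetical sketch of how a partitioner might record the target device in its compile specs and how a pass could recover it. `CompileSpec` below is a stand-in for ExecuTorch's key/value compile-spec record; the `"device"` key and the `"cuda:0"` string encoding are illustrative assumptions, not the PR's exact API.

```python
# Hypothetical sketch: step (a) records the device in a compile spec,
# step (b) parses it back out when annotating delegate boundary tensors.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CompileSpec:
    # Stand-in for the real ExecuTorch CompileSpec(key, value: bytes) record.
    key: str
    value: bytes


def make_device_spec(device: str) -> CompileSpec:
    """Step (a): record the device the partitioned blob will run on, e.g. 'cuda:0'."""
    return CompileSpec(key="device", value=device.encode("utf-8"))


def parse_device_spec(specs: List[CompileSpec]) -> Optional[Tuple[str, int]]:
    """Step (b): recover (device_type, device_index) from the compile specs."""
    for spec in specs:
        if spec.key == "device":
            text = spec.value.decode("utf-8")
            if ":" in text:
                device_type, index = text.split(":", 1)
                return device_type, int(index)
            return text, 0  # no explicit index: default to index 0
    return None  # no device spec: tensors keep their cpu:0 default
```

Returning `None` (rather than `("cpu", 0)`) when no spec is present mirrors the behavior discussed later in this thread: unannotated tensors simply keep their defaults.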
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18078
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEV — if your PR is affected, please view it below. ❌ 1 New Failure, 2 Unrelated Failures as of commit 4ae5949 with merge base 0bf3c51:
NEW FAILURE — the following job has failed.
BROKEN TRUNK — the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
        if lowered_module is None:
            continue

        result = _get_target_device_from_compile_specs(lowered_module)
```
This effectively assumes that we know the device 'name' AOT. In theory we can have a multi-device delegate, and then the runtime might interpret this name differently, which can cause some confusion, e.g. a cuda:0 device on Metal.
I am not sure about using generic names like 'gpu', but I am also not sure about following PyTorch's eager/jit-style naming convention, where you won't switch devices underneath.
May I have your suggestions on the executorch device name?
Currently we set the device name AOT and intentionally decouple our device attribute from the pytorch/pytorch device concept; we created an enum in the etensor schema for all the devices we support right now. This way we can support as many devices as we want.
For the situation you mentioned: if another backend like Vulkan needs its own GPU device, it should add a new entry to the enum. We should avoid using generic names like 'gpu'.
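As an illustrative sketch only (the real enum lives in the etensor schema, and these member names and values are assumptions): a backend-specific device enum in the spirit the author describes might look like this.

```python
# Hypothetical device enum in the spirit of the etensor-schema enum.
# Actual names/values are defined by the schema, not by this sketch.
from enum import IntEnum


class DeviceType(IntEnum):
    CPU = 0
    CUDA = 1
    # A backend needing its own device (e.g. Vulkan) would add a new,
    # specific entry here rather than reuse a generic name like 'gpu'.
```

Because each backend adds its own named entry, the runtime never has to guess what a generic label like 'gpu' means on a given platform.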
Multi-device graph serialization will necessitate multiple graphs. We can maybe make an exception for input tensors, but for any intermediate, the runtime needs to know which device it is loading intermediates onto.
The device is fixed AOT at export. If you want some generic shader-style lib where the GPU type is decided lazily, then you will have to use a generic key like gpu.
… onto TensorSpecs Pull Request resolved: #18078 Annotate the delegate's input and output tensors with a specific device type. The overall pipeline is: a. The partitioner uses `compile_spec` to determine which device the partitioned blob is running on. b. After the partitioned graph is lowered to a backend, the newly introduced propagate_device_pass annotates the input and output tensors of the delegate blob with the target device. ghstack-source-id: 352045003 @exported-using-ghexport Differential Revision: [D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/)
… onto TensorSpecs Pull Request resolved: #18078 Annotate the delegate's input and output tensors with a specific device type. The overall pipeline is: a. The partitioner uses `compile_spec` to determine which device the partitioned blob is running on. b. After the partitioned graph is lowered to a backend, the newly introduced propagate_device_pass annotates the input and output tensors of the delegate blob with the target device and correct device index. ghstack-source-id: 363318415 @exported-using-ghexport Differential Revision: [D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/)
```python
    return device_type, device_index


def _get_lowered_module(
```
nit: this type of util really should live in a single spot; there are other helpers like this in the passes. Let's take it as a follow-up to have claude search for generic utils like this and centralize them.
```python
    device_index: int = 0,
) -> None:
    """Set the device attribute on a TensorSpec."""
    spec.device = device_type
```
Are these fields already in the TensorSpec class definition? Are they initialized to just cpu and 0?
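For context on the question above, here is a minimal sketch under the stated assumption that the fields default to cpu:0. `TensorSpec` here is a tiny stand-in, not the real ExecuTorch class; if the defaults hold, the pass only has to touch delegate boundary tensors and can leave everything else alone.

```python
# Minimal stand-in for TensorSpec, assuming cpu:0 defaults as asked above.
from dataclasses import dataclass


@dataclass
class TensorSpec:
    device: str = "cpu"      # assumed default device type
    device_index: int = 0    # assumed default device index


def set_device(spec: TensorSpec, device_type: str, device_index: int = 0) -> None:
    """Set the device attribute on a TensorSpec."""
    spec.device = device_type
    spec.device_index = device_index
```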
```python
for node in graph_module.graph.nodes:
    if node.op == "call_function" and node.target == executorch_call_delegate:
        lowered_module = _get_lowered_module(graph_module, node)
        if lowered_module is None:
```
We should throw here no?
Let me throw here. I don't think it will be None.
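A sketch of the change the author agrees to above: raise instead of silently skipping when a delegate node has no lowered module. The helper name and error message are assumptions for illustration.

```python
# Hypothetical helper: fail loudly when a delegate node's lowered module
# is missing, instead of `continue`-ing past it.
def check_lowered_module(lowered_module, node_name: str):
    if lowered_module is None:
        raise RuntimeError(
            f"executorch_call_delegate node '{node_name}' has no lowered module"
        )
    return lowered_module
```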
```python
            continue

        result = _get_target_device_from_compile_specs(lowered_module)
        if result is None:
```
Why does it not return cpu by default?
The default value for every tensor is cpu:0. If the backend author didn't set a device-related compile spec, every tensor keeps the default value, and there is no need to reset it to cpu.
```python
# Second pass: propagate device through getitem nodes that extract
# individual outputs from a delegate call.
for node in graph_module.graph.nodes:
```
Can we just do one pass? You can look at the users of the delegate node to find the getitem nodes.
Yes, we can, but I feel like two passes are more structural: one is specific to delegate inputs and the other to delegate outputs.
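For reference, the one-pass alternative suggested above can be sketched like this: instead of a second walk over the graph, visit the users of each delegate node directly to collect the getitem nodes. `Node` below is a minimal stand-in for `torch.fx.Node` (which exposes `op`, `target`, and a `users` dict), so the sketch runs standalone.

```python
# Sketch of finding getitem outputs via delegate-node users (one pass).
# `Node` is a minimal stand-in for torch.fx.Node, not the real class.
import operator
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # eq=False keeps identity hashing so Nodes can key dicts
class Node:
    op: str
    target: object
    users: Dict["Node", None] = field(default_factory=dict)


def getitem_users(delegate_node: Node) -> List[Node]:
    """Collect the getitem nodes that unpack the delegate's outputs."""
    return [
        user
        for user in delegate_node.users
        if user.op == "call_function" and user.target is operator.getitem
    ]
```

In real `torch.fx` graphs, calls like `y = delegate_out[0]` lower to `call_function` nodes with `operator.getitem` as the target, which is what this filter matches.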
…ized Tensor (#18079) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #18080 * #18328 * __->__ #18079 * #18078 Propagate device information from `TensorSpec.device` (set by `PropagateDevicePass`) to the serialized `schema.Tensor` in the emitted PTE file, so the runtime is aware of it. Differential Revision: [D95899706](https://our.internmc.facebook.com/intern/diff/D95899706/)
…ensorSpecs (#18893) This PR was created by the merge bot to help merge the original PR into the main branch. ghstack PR number: #18078 by @Gasoonjia ^ Please use this as the source of truth for the PR details, comments, and reviews ghstack PR base: https://github.com/pytorch/executorch/tree/gh/gasoonjia/135/base ghstack PR head: https://github.com/pytorch/executorch/tree/gh/gasoonjia/135/head Merge bot PR base: https://github.com/pytorch/executorch/tree/main Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/gasoonjia/135/orig Differential Revision: [D95842511](https://our.internmc.facebook.com/intern/diff/D95842511/) @diff-train-skip-merge Co-authored-by: gasoonjia <gasoonjia@icloud.com>
Stack from ghstack (oldest at bottom):
Add end-to-end device type annotation support from export to runtime. Currently we only support one device per graph.
The overall pipeline is:
a. The partitioner uses `compile_spec` to determine which device the partitioned blob is running on.
b. After the partitioned graph is lowered to a backend, the newly introduced propagate_device_pass annotates the input and output tensors of the delegate blob with the target device.
Differential Revision: D95842511